Random walks on temporal networks 






Michele Starnini, 1 Andrea Baronchelli, 2 Alain Barrat, 3,4 and Romualdo Pastor-Satorras 1 

1 Departament de Fisica i Enginyeria Nuclear, Universitat Politecnica de Catalunya, Campus Nord B4, 08034 Barcelona, Spain 

2 Department of Physics, College of Computer and Information Sciences, 
Bouve College of Health Sciences, Northeastern University, Boston MA02120, USA 

3 Centre de Physique Theorique, Aix-Marseille Univ, CNRS UMR 7332,

Univ Sud Toulon Var, 13288 Marseille cedex 9, France 

4 Data Science Laboratory, ISI Foundation, Torino, Italy 

(Dated: February 26, 2013) 

Many natural and artificial networks evolve in time. Nodes and connections appear and disappear 
at various timescales, and their dynamics has profound consequences for any processes in which they 
are involved. The first empirical analyses of the temporal patterns characterizing dynamic networks
are still recent, so that many questions remain open. Here, we study how random walks, as a paradigm
of dynamical processes, unfold on temporally evolving networks. To this aim, we use empirical 
dynamical networks of contacts between individuals, and characterize the fundamental quantities 
that impact any general process taking place upon them. Furthermore, we introduce different 
randomizing strategies that allow us to single out the role of the different properties of the empirical 
networks. We show that the random walk exploration is slower on temporal networks than it is on 
the aggregate projected network, even when the time is properly rescaled. In particular, we point 
out that a fundamental role is played by the temporal correlations between consecutive contacts 
present in the data. Finally, we address the consequences of the intrinsically limited duration of 
many real world dynamical networks. Considering the fundamental prototypical role of the random 
walk process, we believe that these results could help to shed light on the behavior of more complex 
dynamics on temporally evolving networks. 

PACS numbers: 05.40.Fb, 89.75.Hc, 89.75.-k 



I. INTRODUCTION 

Many real networks are dynamic structures in which 
connections appear, disappear, or are rewired on various 
timescales p]. For example, the links representing social 
relationships in social networks [5] are a static representa- 
tion of a succession of contact or communication events, 
which are constantly created or terminated between pairs 
of individuals (actors). Such temporal evolution is an in- 
trinsic feature of many natural and artificial networks, 
and can have profound consequences for the dynamical 
processes taking place upon them. Until recently how- 
ever, a large majority of studies about complex networks 
have focused on a static or aggregated representation, in 
which all the links that appeared at least once coexist. 
This is the case, for example, in the seminal works on sci- 
entific collaboration networks 3 , or on movie costarring 
networks [3]. In particular, dynamical processes have 
mainly been studied on static complex networks [5]. 

In recent years, the interest towards the temporal di- 
mension of the network description has blossomed. Em- 
pirical analyses have revealed rich and complex patterns 
of dynamic evolution p] IMPo] , pointing out the need to 
characterize and model them [5] [TBHT5] . At the same 
time, researchers have started to study how the temporal 
evolution of the network substrate impacts the behavior 
of dynamical processes such as epidemic spreading |T3j- 
IT51I2DH22J. synchronization [53], percolation PHHI and 
social consensus [55] . 

Here, we focus on the dynamics of a random walker 



exploring a temporal network [26-28]. The random walk
is indeed the simplest diffusion model, and its dynam- 
ics provides fundamental hints to understand the whole 
class of diffusive processes on networks. Moreover, it 
has relevant applications in such contexts as spreading 
dynamics (e.g., virus or opinion spreading) and searching.
For instance, assuming that each vertex knows only
about the information stored in each of its nearest neigh- 
bors, the most naive economical strategy is the random 
walk search, in which the source vertex sends one mes- 
sage to a randomly selected nearest neighbor [S] [55] [3U] . 
If that vertex has the information requested, it retrieves 
it; otherwise, it sends a message to one of its nearest 
neighbors, until the message arrives at its final target
destination. Thus, the random walk represents a lower
bound on the effects of searching in the absence of any
information in the network, apart from the purely local
information about the contacts at a given instant of time. 

In our study, we consider as typical examples of tem- 
poral networks the dynamical sequences of contact be- 
tween individuals in various social contexts, as recorded 
by the SocioPatterns project [TO] [31]. These datasets 
contain indeed the time-resolved patterns of face-to-face 
co-presence of individuals in settings such as conferences, 
with high temporal resolution: for each contact between 
individuals, the starting and ending times are registered 
by the measuring infrastructure, giving access to the tim- 
ing and duration of contacts. 

The paper is structured as follows. In Sec. II we review some of the fundamental results
for random walks on static networks. In Sec. III we describe the empirical dynamical
networks considered: we recall some basic definitions, present an analysis of the datasets,
and introduce suitable randomization procedures, which will help later on to pinpoint the
role of the correlations in the real data. In Sec. IV we write down mean-field equations
for the case of maximally randomized dynamical contact networks, and in Sec. V we
investigate the random walk dynamics numerically, focusing on the exploration properties
and on the mean first passage times. Sec. VI is devoted to the analysis of the impact of
the finite temporal duration of real time series. Finally, we summarize our results and
comment on some perspectives in Sec. VII.



II. A SHORT OVERVIEW OF RANDOM 
WALKS ON STATIC NETWORKS 

The random walk (RW) process is defined by a walker 
that, located on a given vertex i at time t, hops to a 
nearest neighbor vertex j at time t + 1. 

In binary networks, defined by the adjacency matrix $a_{ij}$ such that $a_{ij} = 1$ if $j$
is a neighbor of $i$, and $a_{ij} = 0$ otherwise, the transition probability at each time
step from $i$ to $j$ is

$$P_b(i \to j) = \frac{a_{ij}}{\sum_r a_{ir}} \equiv \frac{a_{ij}}{k_i}, \tag{1}$$

where $k_i = \sum_j a_{ij}$ is the degree of vertex $i$: the walker hops to a nearest
neighbor of $i$, chosen uniformly at random among the $k_i$ neighbors, hence with
probability $1/k_i$ (note that we consider here undirected networks with
$a_{ij} = a_{ji}$, but the process can be considered as well on directed networks). In
weighted networks with a weight matrix $\omega_{ij}$, the transition probability takes
instead the form

$$P_w(i \to j) = \frac{\omega_{ij}}{\sum_r \omega_{ir}} \equiv \frac{\omega_{ij}}{s_i}, \tag{2}$$

where $s_i = \sum_j \omega_{ij}$ is the strength of vertex $i$. Here the walker chooses a
nearest neighbor with probability proportional to the weight of the corresponding
connecting edge.
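The two hopping rules can be illustrated with a minimal sketch. The function name `transition_probabilities` and the three-node matrices are hypothetical choices made only for this example; a 0/1 matrix reproduces the binary rule and a non-negative weight matrix the weighted rule.

```python
def transition_probabilities(weights):
    """Return P(i -> j) = w_ij / s_i for a square matrix given as nested lists.

    A 0/1 adjacency matrix gives the binary rule; a general non-negative
    matrix gives the weighted rule.
    """
    probs = []
    for row in weights:
        strength = sum(row)  # k_i for binary input, s_i for weighted input
        if strength == 0:
            # Isolated vertex: no hop is defined, so the row is left at zero.
            probs.append([0.0] * len(row))
        else:
            probs.append([w / strength for w in row])
    return probs


# Hypothetical 3-node example: node 0 is linked to nodes 1 and 2.
adjacency = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
weights = [[0, 3, 1], [3, 0, 0], [1, 0, 0]]
p_binary = transition_probabilities(adjacency)
p_weighted = transition_probabilities(weights)
```

In this toy case the binary walker leaves node 0 towards nodes 1 and 2 with equal probability, while the weighted walker prefers node 1 because of its heavier link.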

The basic quantity characterizing random walks in networks is the occupation probability
$p_i$, defined as the steady state probability (i.e., measured in the infinite time limit)
that the walker occupies the vertex $i$, or in other words, the steady state probability
that the walker will land on vertex $i$ after a jump from any other vertex. Following
rigorous master equation arguments, it is possible to show that the occupation probability
takes the form

$$p_i^b = \frac{k_i}{\langle k \rangle N}, \qquad p_i^w = \frac{s_i}{\langle s \rangle N}, \tag{3}$$

respectively in binary and weighted networks.

Other characteristic properties of the random walk, 
relevant to the properties of searching in networks, are 



the mean first-passage time (MFPT) $\tau_i$ and the coverage
$C(t)$ [26-28]. The MFPT of a node $i$ is defined as the
average time taken by the random walker to arrive for 
the first time at i, starting from a random initial posi- 
tion in the network. This definition gives the number of 
messages that have to be exchanged, on average, in order 
to find vertex i. The coverage C(t), on the other hand, is 
defined as the number of different vertices that have been 
visited by the walker at time t, averaged for different ran- 
dom walks starting from different sources. The coverage 
can thus be interpreted as the searching efficiency of the 
network, measuring the number of different individuals 
that can be reached from an arbitrary origin in a given 
number of time steps. 

At a mean-field level, these quantities are computed as follows: let us define $P_f(i;t)$
as the probability for the walker to arrive for the first time at vertex $i$ in $t$ time
steps. Since in the steady state $i$ is reached in a jump with probability $p_i$, we have
$P_f(i;t) = [1 - p_i]^{t-1} p_i$. The MFPT to vertex $i$ can thus be estimated as the
average $\tau_i = \sum_t t P_f(i;t)$, leading to

$$\tau_i = \sum_{t=1}^{\infty} t \, [1 - p_i]^{t-1} p_i \equiv \frac{1}{p_i}. \tag{4}$$

On the other hand, we can define the random walk reachability of vertex $i$, $P_r(i;t)$,
as the probability that vertex $i$ is visited by a random walk starting at an arbitrary
origin, at any time less than or equal to $t$. The reachability takes the form

$$P_r(i;t) = 1 - [1 - p_i]^t \simeq 1 - \exp(-t p_i), \tag{5}$$

where the last expression is valid in the limit of sufficiently small $p_i$. The coverage
of a random walk at time $t$ will thus be given by the sum of these probabilities, i.e.

$$\frac{C(t)}{N} = \frac{1}{N} \sum_i P_r(i;t) \equiv 1 - \frac{1}{N} \sum_i \exp(-t p_i). \tag{6}$$

For sufficiently small $p_i t$, the exponential in Eq. (6) can be expanded to yield
$C(t) \sim t$, a linear coverage implying that at the initial stages of the walk, a
different vertex is visited at each time step, independently of the network
properties [551155].
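The mean-field expressions above are simple enough to check numerically. The sketch below is only an illustration: the function names and the four-node degree sequence are hypothetical, and the coverage is returned un-normalized, i.e. as $C(t)$ rather than $C(t)/N$.

```python
import math


def occupation_probabilities(strengths):
    """p_i = s_i / (<s> N), which equals s_i / sum_j s_j."""
    total = sum(strengths)
    return [s / total for s in strengths]


def mean_field_mfpt(p):
    """Mean-field MFPT, tau_i = 1 / p_i."""
    return [1.0 / p_i for p_i in p]


def mean_field_coverage(p, t):
    """Un-normalized mean-field coverage, C(t) = N - sum_i exp(-t p_i)."""
    return len(p) - sum(math.exp(-t * p_i) for p_i in p)


def mfpt_by_direct_sum(p_i, t_max=10_000):
    """Truncated sum of t (1 - p_i)^(t-1) p_i, which approaches 1 / p_i."""
    return sum(t * (1.0 - p_i) ** (t - 1) * p_i for t in range(1, t_max + 1))


degrees = [1, 2, 3, 4]  # hypothetical degree sequence of a 4-node network
p = occupation_probabilities(degrees)
tau = mean_field_mfpt(p)
```

For the least connected node, `tau[0]` and `mfpt_by_direct_sum(p[0])` should agree closely, and `mean_field_coverage(p, t)` should grow approximately as `t` for small `t`, in line with the linear regime discussed above.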

It is now important to note that the random walk pro- 
cess has been defined here in a way such that the walker 
performs a move and changes node at each time step, po- 
tentially exploring a new node: except in the pathological 
case of a random walk starting on an isolated node, the 
walker has always a way to move out of the node it occu- 
pies. In the context of temporal networks, on the other 
hand, the walker might arrive at a node i that at the suc- 
cessive time step becomes isolated, and therefore has to 
remain trapped on that node until a new link involving 
i occurs. In order to compare in a meaningful way ran- 
dom walk processes on static and dynamical networks, 
and on different dynamical networks, we consider in each 



dynamical network the average probability $\bar{p}$ that a node
has at least one link. The walker is then expected to
move on average once every $1/\bar{p}$ time steps, so that we will
consider the properties of the random walk process on
dynamical networks as a function of the rescaled time $\bar{p} t$.
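The trapping effect described above can be made concrete with a small simulation sketch. It assumes, for illustration only, that contacts are stored as `(i, j, t)` triplets with 0-based time steps, that links are undirected, and that the walker hops to a uniformly chosen current neighbor or waits when it has none; the function name `walk_coverage` and the toy data are hypothetical.

```python
import random


def walk_coverage(contacts, t_total, start, rng):
    """Simulate one walker; return the number of distinct nodes seen after each step.

    contacts: iterable of undirected (i, j, t) triplets with 0 <= t < t_total.
    """
    neighbors = [dict() for _ in range(t_total)]  # adjacency lists per time step
    for i, j, t in contacts:
        neighbors[t].setdefault(i, []).append(j)
        neighbors[t].setdefault(j, []).append(i)

    position = start
    visited = {start}
    coverage = []
    for t in range(t_total):
        options = neighbors[t].get(position)
        if options:
            # At least one active link: hop to a uniformly chosen neighbor.
            position = rng.choice(options)
            visited.add(position)
        # Otherwise the node is isolated at time t and the walker waits.
        coverage.append(len(visited))
    return coverage


# Hypothetical 4-node sequence; no contact is active at t = 1.
toy_contacts = [(0, 1, 0), (1, 2, 2), (2, 3, 3)]
toy_coverage = walk_coverage(toy_contacts, t_total=4, start=0, rng=random.Random(0))
```

In this toy sequence every hop has a single option, so the trajectory is deterministic: the walker is stuck on node 1 during the empty time step and the coverage stays flat there.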



III. EMPIRICAL DYNAMICAL NETWORKS
A. Basics on temporal networks 

Dynamical or temporal networks [I] are properly represented in terms of a contact
sequence, representing the contacts (edges) as a function of time: a set of triplets
$(i, j, t)$ where $i$ and $j$ are interacting at time $t$, with $t \in \{1, \ldots, T\}$,
where $T$ is the total duration of the contact sequence. The contact sequence can thus be
expressed in terms of a characteristic function (or temporal adjacency matrix [37])
$\chi(i, j, t)$, taking the value 1 when actors $i$ and $j$ are connected at time $t$,
and zero otherwise.

Coarse-grained information about the structure of dynamical networks can be obtained by
projecting them onto aggregated static networks, either binary or weighted. The binary
projected network informs of the total number of contacts of any given actor, while its
weighted version carries additional information on the total time spent in interactions
by each actor p] HI ETJ [38]. The aggregated binary network is defined by an adjacency
matrix of the form

$$a_{ij} = \Theta\left( \sum_t \chi(i, j, t) \right), \tag{7}$$

where $\Theta(x)$ is the Heaviside theta function defined by $\Theta(x) = 1$ if $x > 0$
and $\Theta(x) = 0$ if $x \leq 0$. In this representation, the degree of vertex $i$,
$k_i = \sum_j a_{ij}$, represents the number of different agents with whom agent $i$ has
interacted. The associated weighted network, on the other hand, has weights of the form

$$\omega_{ij} = \frac{1}{T} \sum_t \chi(i, j, t). \tag{8}$$

Here, $\omega_{ij}$ represents the number of interactions between agents $i$ and $j$,
normalized by its maximum possible value, i.e. the total duration of the contact sequence
$T$. The strength of vertex $i$, $s_i = \sum_j \omega_{ij}$, represents the average number
of interactions of agent $i$ at each time step.
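As an illustration of the two projections, the following sketch builds them from a list of contact triplets. It assumes each active link appears once per time step as an `(i, j, t)` triplet; the function names and the toy data are hypothetical.

```python
from collections import defaultdict


def aggregate(contacts, t_total):
    """Build the binary and weighted projections of a contact sequence.

    contacts: iterable of (i, j, t) triplets, one per active link and time step.
    Returns (adjacency, weights) as dicts keyed by unordered pairs (min, max).
    """
    counts = defaultdict(int)
    for i, j, t in contacts:
        counts[(min(i, j), max(i, j))] += 1
    adjacency = {pair: 1 for pair in counts}  # pair seen at least once
    weights = {pair: c / t_total for pair, c in counts.items()}
    return adjacency, weights


def degree_and_strength(adjacency, weights):
    """Sum the projected matrices over neighbors to get k_i and s_i."""
    degree, strength = defaultdict(int), defaultdict(float)
    for (i, j), a in adjacency.items():
        degree[i] += a
        degree[j] += a
    for (i, j), w in weights.items():
        strength[i] += w
        strength[j] += w
    return dict(degree), dict(strength)


contacts = [(0, 1, 0), (0, 1, 1), (1, 2, 1), (0, 1, 3)]  # hypothetical data
adjacency, weights = aggregate(contacts, t_total=4)
degree, strength = degree_and_strength(adjacency, weights)
```

Here the pair (0, 1) is in contact during three of the four time steps, so its weight is 0.75, while the binary projection only records that the link existed.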

While static projections represent a first step in the 
understanding of the properties of dynamical networks, 
they coarse-grain a great deal of information from the em- 
pirical time series, a fact that can be particularly relevant 
when considering dynamical processes running on top of 
dynamical networks [21]. At a basic topological level, 
projected networks disregard the fact that dynamics on 
temporal networks are in general restricted to follow time 
respecting paths [3 \7\ HH HU [Ml HO] , meaning that if 



a contact between vertices $i$ and $j$ took place at times
$\mathcal{T}_{ij} = \{t_{ij}^{(1)}, t_{ij}^{(2)}, \ldots, t_{ij}^{(n)}\}$, it cannot be used in the course
of a dynamical process at any time $t \notin \mathcal{T}_{ij}$. Therefore,
not all the network is available for propagating a dynam- 
ics that starts at any given node, but only those nodes 
belonging to its set of influence [7] , defined as the set of 
nodes that can be reached from a given one, following 
time respecting paths. Moreover, an important role can 
also be played by the bursty nature of dynamical and so- 
cial processes, where the appearance and disappearance 
of links do not follow a Poisson process, but show instead
long tails in the distribution of link presence and
absence durations, as well as long range correlations in 
the times of successive link occurrences [HI [101 HH EI] ■ 
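The set of influence can be computed with a single sweep over the contacts in time order. The sketch below is one possible implementation, not the procedure used for the analysis; it assumes undirected `(i, j, t)` triplets and at most one hop per time step, and the function name and toy data are hypothetical.

```python
def set_of_influence(contacts, source, t_start=0):
    """Nodes reachable from `source` along time-respecting paths.

    Assumption: a contact at time t can only be used by a node that was
    reached strictly before t (the source is available from t_start on).
    """
    reached_at = {source: t_start - 1}
    by_time = {}
    for i, j, t in contacts:
        if t >= t_start:
            by_time.setdefault(t, []).append((i, j))
    for t in sorted(by_time):
        for i, j in by_time[t]:
            for a, b in ((i, j), (j, i)):
                # a must have been reached before t; b must be new.
                if b not in reached_at and reached_at.get(a, t) < t:
                    reached_at[b] = t
    return set(reached_at) - {source}


# Hypothetical chain 0-1-2-3 whose links appear in an unfavorable order.
chain = [(1, 2, 0), (0, 1, 1), (2, 3, 2)]
```

Although the aggregated network of `chain` is connected, node 0 can only influence node 1, because the link between 1 and 2 was active before node 1 could be reached.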



B. Empirical contact sequences 

The temporal networks used in the present study de- 
scribe the sequences of face-to-face contact between in- 
dividuals recorded by the SocioPatterns collaboration 
[TTJl [3Tj : in the deployments of the SocioPatterns in- 
frastructure, each individual wears a badge equipped 
with an active radio-frequency identification (RFID) 
device. These devices engage in bidirectional radio- 
communication at very low power when they are close 
enough, and relay the information about the proximity 
of other devices to RFID readers installed in the environment.
The devices' properties are tuned so that face-to-face
proximity (1-2 meters) of individuals wearing the
tags on their chests can be assessed with a temporal resolution
of 20 seconds ($\Delta t_0 = 20$ seconds thus represents
the elementary time interval that can be considered).

We consider here datasets describing the face-to-face 
proximity of individuals gathered in several different 
social contexts: the European Semantic Web Conference
("eswc"), the Hypertext conference ("ht"), the 25th
Chaos Communication Congress ("25c3")¹ and a primary
school ("school"). A description of the corresponding
contexts and various analyses of the corresponding
datasets can be found in Refs [TO ] |2" H 155 ) 132 ] . 

In Table I we summarize the main average properties of the datasets we are considering
that are of interest in the context of walks on dynamical networks. In particular, we
focus on:

• $N$: number of different individuals engaged in interactions;

• $T$: total duration of the contact sequence, in units of the elementary time interval
$\Delta t_0 = 20$ seconds;

• $\langle k \rangle = \sum_i k_i / N$: average degree of nodes in the projected binary
network, aggregated over the whole dataset;

• $\bar{p} = \sum_t p(t) / T$: average fraction of individuals $p(t)$ interacting at each
time step;

• $\bar{f} = \sum_t E(t) / T = \sum_{ijt} \chi(i, j, t) / 2T$: mean frequency of the
interactions, defined as the average number of edges $E(t)$ of the instantaneous network
at time $t$;

• $\bar{n} = \sum_t n(t) / 2T$: average number of new conversations $n(t)$ starting at
each time step;

• $\langle \Delta t_c \rangle$: average duration of a contact;

• $\langle s \rangle = \sum_i s_i / N$: average strength of nodes in the projected
weighted network, defined as the mean number of interactions per agent at each time step,
averaged over all agents.

Dataset   $N$   $T$    $\langle k \rangle$   $\bar{p}$   $\bar{f}$   $\bar{n}$   $\langle \Delta t_c \rangle$   $\langle s \rangle$
25c3      569   7450   185                   0.215       256         91          2.82                           0.90
eswc      173   4703   50                    0.059       7           2.8         2.41                           0.079
ht        113   5093   39                    0.060       4           1.9         2.13                           0.072
school    242   3100   69                    0.235       41          25          1.63                           0.34

Table I. Some average properties of the datasets under consideration.

1 In this particular case, the proximity detection range extended to 4-5 meters and
packet exchange between devices was not necessarily linked to face-to-face proximity.
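Some of the quantities listed above can be extracted directly from a contact list. The sketch below is a minimal illustration under stated assumptions: contacts are `(i, j, t)` triplets with 0-based times, the activity is measured as the fraction of nodes with at least one link, and a conversation is a maximal run of consecutive contacts of the same pair. The function names and the data are hypothetical.

```python
def contact_statistics(contacts, n_nodes, t_total):
    """Return (p_bar, f_bar) for a contact sequence of (i, j, t) triplets.

    p_bar: fraction of nodes with at least one link, averaged over time steps.
    f_bar: number of active links per time step, averaged over time steps.
    """
    links = {(min(i, j), max(i, j), t) for i, j, t in contacts}  # drop duplicates
    active = {(i, t) for i, j, t in links} | {(j, t) for i, j, t in links}
    p_bar = len(active) / (n_nodes * t_total)
    f_bar = len(links) / t_total
    return p_bar, f_bar


def conversations(contacts):
    """Group consecutive contacts of each pair into (i, j, t_start, duration)."""
    times = {}
    for i, j, t in contacts:
        times.setdefault((min(i, j), max(i, j)), set()).add(t)
    result = []
    for (i, j), ts in times.items():
        ordered = sorted(ts)
        start = prev = ordered[0]
        for t in ordered[1:]:
            if t != prev + 1:  # a gap closes the running conversation
                result.append((i, j, start, prev - start + 1))
                start = t
            prev = t
        result.append((i, j, start, prev - start + 1))
    return result


sample = [(0, 1, 0), (0, 1, 1), (1, 2, 1), (0, 1, 3)]  # hypothetical data
p_bar, f_bar = contact_statistics(sample, n_nodes=3, t_total=4)
talks = sorted(conversations(sample))
```

The average contact duration then follows as the mean of the last entry of each tuple in `talks`.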

Table I shows the heterogeneity of the considered datasets, in terms of size, overall
duration and contact densities. In particular, while the dataset 25c3 shows a high density
of interactions (high $\bar{p}$, $\bar{f}$ and $\bar{n}$), and consequently a large
average degree and average strength, the others are sparser. Moreover, as also shown in
the deployments timelines in [TO], some of the datasets show large periods of low
activity, followed by bursty peaks with a lot of contacts in few time steps, while others
present more regular interactions between elements. In this respect, it is worth noting
that we will not consider those portions of the datasets with very low activity, in which
only a few pairs of elements interact, such as the beginning or ending part of conferences
or the nocturnal periods.

The heterogeneity and burstiness of the contact patterns of the face-to-face interactions
[10] are revealed by the study of the distribution of the duration $\Delta t$ of contacts
between pairs of agents, $P(\Delta t)$, the distribution of the total time in contact of
pairs of agents (the weight distribution $P(w)$), and the distribution of gap times,
$\tau$, between two consecutive conversations involving a common individual and two other
different agents, for a single agent $i$, $P_i(\tau)$, or considering all the agents,
$P(\tau)$. All these distributions are heavy-tailed, typically compatible with power-law
behaviors (see Fig. 1), corresponding to the burstiness of human interactions [41 .

As noted above, diffusion processes such as random walks are moreover particularly
impacted by the structure of paths between nodes. In this respect, time respecting paths
represent a crucial feature of any temporal network, since they determine the set of
possible causal interactions between the actors of the graph.

Figure 1. (Color Online) Distributions $P(\Delta t)$ (duration of contacts), $P(w)$ (total
contact time between pairs of agents), $P_i(\tau)$ (gap times of a single individual $i$)
and $P(\tau)$ (global gap times). In the case of $P_i(\tau)$, we only plot the gap times
distribution of the agent which engages in the largest number of conversations, but the
other agents exhibit a similar behavior. All distributions are heavy-tailed, indicating
the bursty nature of face-to-face interactions, for the four empirical contact sequences
considered.

For each (ordered) pair of nodes (i,j), time-respecting 
paths from i to j can either exist or not; moreover, the 
concept of shortest path on static networks (i.e., the path 
with the minimum number of links between two nodes) 
yields several possible generalizations in a temporal net- 
work: 

• the fastest path is the one that allows one to go from
i to j, starting from the dataset initial time, in 
the minimum possible time, independently of the 
number of intermediate steps; 

• the shortest time-respecting path between i and j is 
the one that corresponds to the smallest number of 
intermediate steps, independently of the time spent 
between the start from i and the arrival to j. 
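The difference between the two generalizations can be seen by computing earliest arrival times under a hop limit. The sketch below is an illustrative implementation with its own conventions, which are assumptions rather than the definitions used for the analysis: contacts are undirected `(i, j, t)` triplets, successive hops use strictly increasing times, durations are measured from the first appearance of the source, and source and target differ. All function names and the toy data are hypothetical.

```python
import math


def hop_limited_arrivals(contacts, n_nodes, source, max_hops):
    """Earliest arrival times from `source` using at most h hops, h = 0..max_hops."""
    # First appearance of the source (raises ValueError if it never appears).
    t0 = min(t for i, j, t in contacts if source in (i, j))
    arrival = [math.inf] * n_nodes
    arrival[source] = t0 - 1  # the source may use any contact from t0 on
    history = [list(arrival)]
    for _ in range(max_hops):
        updated = list(arrival)
        for i, j, t in contacts:
            for a, b in ((i, j), (j, i)):
                # Extend a path that reached a before t by one hop to b.
                if arrival[a] < t < updated[b]:
                    updated[b] = t
        arrival = updated
        history.append(list(arrival))
    return t0, history


def shortest_and_fastest(contacts, n_nodes, source, target):
    """Return (l_s, dt_s, l_f, dt_f), or None if no time-respecting path exists."""
    t0, history = hop_limited_arrivals(contacts, n_nodes, source, n_nodes - 1)
    best = history[-1][target]
    if math.isinf(best):
        return None
    l_s = next(h for h, arr in enumerate(history) if arr[target] < math.inf)
    l_f = next(h for h, arr in enumerate(history) if arr[target] == best)
    return l_s, history[l_s][target] - t0, l_f, best - t0


# Hypothetical data: a slow direct link 0-3 and a quick three-hop route.
toy = [(0, 1, 0), (1, 2, 1), (2, 3, 2), (0, 3, 5)]
```

For the pair (0, 3) in `toy`, the shortest time-respecting path is the single direct hop that only becomes available at time 5, whereas the fastest path takes three hops and arrives at time 2.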

For each node pair $(i, j)$, we denote by $l^{f}_{ij}$, $l^{s,temp}_{ij}$,
$l^{s,stat}_{ij}$ the lengths (in terms of the number of hops) respectively of the fastest
path, the shortest time-respecting path, and the shortest path on the aggregated network,
and by $\Delta t^{f}_{ij}$ and $\Delta t^{s}_{ij}$ the duration of the fastest and
shortest time-respecting paths, where we take as initial time the first appearance of $i$
in the dataset. As already noted in other works [3TJ E3], $l^{f}_{ij}$ can be much larger
than $l^{s,stat}_{ij}$. Moreover, it is clear that
$l^{f}_{ij} \geq l^{s,temp}_{ij} \geq l^{s,stat}_{ij}$; from the duration point of view,
on the contrary, $\Delta t^{f}_{ij} \leq \Delta t^{s}_{ij}$.

We therefore define the following quantities:

• $l_e$: fraction of the $N(N - 1)$ ordered pairs of nodes for which a time-respecting
path exists;

• $\langle l_s \rangle$: average length (in terms of number of hops along network links)
of the shortest time-respecting paths;

• $\langle \Delta t_s \rangle$: average duration of the shortest time-respecting paths;

• $\langle l_f \rangle$: average length of the fastest time-respecting paths;

• $\langle \Delta t_f \rangle$: average duration of the fastest time-respecting paths;

• $\langle l_{s,stat} \rangle$: average shortest path length in the binary (static)
projected network.

Dataset   $l_e$   $\langle l_s \rangle$   $\langle \Delta t_s \rangle$   $\langle l_f \rangle$   $\langle \Delta t_f \rangle$   $\langle l_{s,stat} \rangle$
25c3      0.91    1.67                    1607                           4.7                     893                            1.67
eswc      0.99    1.75                    884                            4.95                    287                            1.73
ht        0.99    1.67                    1157                           3.86                    452                            1.66
school    1       1.76                    853                            8.27                    349                            1.73

Table II. (Color Online) Average properties of the shortest time-respecting paths,
fastest paths and shortest paths in the projected network, in the datasets considered.

The corresponding empirical values are reported in Table II.
It turns out that the great majority of pairs of
nodes are causally connected by at least one path in all 
datasets. Hence, almost every node can potentially be 
influenced by any other actor during the time evolution, 
i.e., the set of sources and the set of influence of the great 
majority of the elements are almost complete (of size N) 
in all of the considered datasets. 

In Fig. 2 we show the distributions of the lengths, $P(l_s)$, and durations,
$P(\Delta t_s)$, of the shortest time-respecting path for different datasets. In the same
figure we choose one dataset to compare the $P(l_s)$ and the $P(\Delta t_s)$ distributions
with the distributions of the lengths, $P(l_f)$, and durations, $P(\Delta t_f)$, of the
fastest path. The $P(l_s)$ distribution is short tailed and peaked on $l = 2$, with a
small average value $\langle l_s \rangle$, even considering the relatively small sizes $N$
of the datasets, and it is very similar to the projected network one
$\langle l_{s,stat} \rangle$ (see Table II). The $P(l_f)$ distribution, on the contrary,
shows a smooth behavior, with an average value $\langle l_f \rangle$ several times bigger
than the shortest path one, $\langle l_s \rangle$, as expected [HJ [33]. Note that,
despite the important differences in the datasets characteristics, the $P(l_s)$
distributions (as well as $P(l_f)$, although not shown) collapse, once rescaled. On the
other hand, the $P(\Delta t_s)$ and $P(\Delta t_f)$ distributions show the same
broad-tailed behavior, but the average duration $\langle \Delta t_s \rangle$ of the
shortest paths is much longer than the average duration $\langle \Delta t_f \rangle$ of
the fastest paths, and of the same order of magnitude as the total duration of the contact
sequence $T$.

Figure 2. (Color Online) Top: Distribution of the temporal duration of the shortest
time-respecting paths, normalized by its maximum value $T$. Inset: probability
distribution $P(l_s)$ of the shortest path length measured over time-respecting paths,
and normalized with its mean value $\langle l_s \rangle$. Note that the different datasets
collapse. Bottom: Probability distribution of the duration of the shortest
$P(\Delta t_s)$ and fastest $P(\Delta t_f)$ time-respecting paths, for the eswc dataset.
Inset: Probability distribution of the shortest $P(l_s)$ and fastest $P(l_f)$ path length
for the same dataset.

Thus, a temporal network may be topologically well 
connected and at the same time difficult to navigate or 
search. Indeed spreading and searching processes need 
to follow paths whose properties are determined by the 
temporal dynamics of the network, and that might be 
either very long or very slow.



C. Synthetic extensions of empirical contact 
sequences 

The empirical contact sequences represent the proper dynamical network substrate upon
which the properties of any dynamical process should be studied. In many cases however,
the finite duration of empirical datasets is not sufficient to allow these processes to
reach their asymptotic state [T3J |H]. This issue is particularly important in processes
that reach a steady state, such as random walks. As discussed in Sec. II, a walker does
not move at every time step, but only with a probability $\bar{p}$, and the effective
number of movements of a walker is of the order $T \bar{p}$. For the considered empirical
sequences, this means that the ratio between the number of hops of the walker and the
network size, $T \bar{p} / N$, assumes values between 3.01 for the school case and 1.60
for the eswc case. Typically, for a random walk process such small times permit one to
observe transient effects only, but not a stationary behavior. Therefore we will first
explore the asymptotic properties of random walks in synthetically extended contact
sequences, and we will consider the corresponding finite time effects in Sec. VI. The
synthetic extensions preserve at different levels the statistical properties observed in
the real data, thus providing null models of dynamical networks.

Inspired by previous approaches to the synthetic ex- 
tension of empirical contact sequences [TJ El [T3J [521 EI] > 
we consider the following procedures: 

• SRep: Sequence replication. The contact sequence is repeated periodically, defining a
new extended characteristic function such that
$\chi^{SRep}(i, j, t) = \chi(i, j, t \bmod T)$. This extension preserves all of the
statistical properties of the empirical data (obviously, when properly rescaled to take
into account the different durations of the extended and empirical time series),
introducing only small corrections, at the topological level, on the distribution of time
respecting paths and the associated sets of influence of each node. Indeed, a contact
present at time $t$ will be again available to a dynamical process starting at time
$t' > t$ after a time $t + T$.

• SRan: Sequence randomization. The time ordering of the interactions is randomized, by
constructing a new characteristic function such that, at each time step $t$,
$\chi^{SRan}(i, j, t) = \chi(i, j, t')$ $\forall i$ and $\forall j$, where $t'$ is a time
chosen uniformly at random from the set $\{1, 2, \ldots, T\}$. This form of extension
yields at each time step an empirical instantaneous network of interactions, and preserves
on average all the characteristics of the projected weighted network, but destroys the
temporal correlations of successive contacts, leading to Poisson distributions for
$P(\Delta t)$ and $P_i(\tau)$.

• SStat: Statistically extended sequence. An intermediate level of randomization can be
achieved by generating a synthetic contact sequence as follows: we consider the set of all
conversations $c(i, j, \Delta t)$ in the sequence, defined as a series of consecutive
contacts of length $\Delta t$ between the pair of agents $i$ and $j$. The new sequence is
generated, at each time step $t$, by choosing $\bar{n}$ conversations ($\bar{n}$ being the
average number of new conversations starting at each time step in the original sequence,
see Table I), randomly selected from the set of conversations, and considering them as
starting at time $t$ and ending at time $t + \Delta t$, where $\Delta t$ is the duration
of the corresponding conversation. In this procedure we avoid choosing conversations
between agents $i$ and $j$ which are already engaged in a contact started at a previous
time $t' < t$. This extension preserves all the statistical properties of the empirical
contact sequence, with the exception of the distribution of time gaps between consecutive
conversations of a single individual, $P_i(\tau)$.

Figure 3. (Color Online) Top: Probability distribution $P_i(\tau)$ of a single individual
and $P(\Delta t)$ (inset) for the extended contact sequences SRep, SRan and SStat, for the
25c3 dataset. The weight distribution $P(w)$ of the original contact sequence is preserved
for every extension. Bottom: Probability distribution of gap times $P(\tau)$ for all the
agents in the SRep, SRan and SStat extensions of the 25c3 dataset.
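The first two extensions can be sketched in a few lines once the sequence is stored as one edge list per time step. This is only an illustration under stated assumptions: time steps are 0-based rather than running from 1 to $T$, the function names are hypothetical, and the SStat procedure is omitted for brevity.

```python
import random


def snapshots(contacts, t_total):
    """Split (i, j, t) triplets, 0 <= t < t_total, into one edge list per step."""
    frames = [[] for _ in range(t_total)]
    for i, j, t in contacts:
        frames[t].append((i, j))
    return frames


def extend_srep(frames, n_copies):
    """SRep: repeat the empirical sequence periodically (time taken modulo T)."""
    t_total = len(frames)
    return [frames[t % t_total] for t in range(n_copies * t_total)]


def extend_sran(frames, n_copies, rng):
    """SRan: at each step use an empirical snapshot drawn uniformly at random."""
    t_total = len(frames)
    return [frames[rng.randrange(t_total)] for _ in range(n_copies * t_total)]


frames = snapshots([(0, 1, 0), (1, 2, 2)], t_total=3)  # hypothetical data
srep = extend_srep(frames, n_copies=2)
sran = extend_sran(frames, n_copies=2, rng=random.Random(0))
```

Both extended sequences reuse only empirical snapshots, so the weights of the projected network are preserved exactly by SRep and on average by SRan.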

In Fig. 3 we plot the distribution of the duration of contacts, $P(\Delta t)$, and the
distribution of gap times between two consecutive conversations realized by a single
individual, $P_i(\tau)$, for the extended contact sequences SRep, SRan and SStat. One can
check that the SRep extension preserves all the $P(w)$, $P(\Delta t)$ and $P_i(\tau)$
distributions of the original contact sequence, the SRan extension preserves only $P(w)$
and the SStat extension preserves both the $P(w)$ and the $P(\Delta t)$ but not the
$P_i(\tau)$, as summarized in Table III. Interestingly, we note that the distribution of
gap times for all agents, $P(\tau)$, is also broadly distributed in the SRan and SStat
extensions, despite the fact that the respective individual burstiness distributions
$P_i(\tau)$ are bounded, see Fig. 3. This fact can be easily understood by considering
that $P(\tau)$ can be written in terms of a convolution of the individual gap
distributions times the probability of starting a conversation. In the case of the SRan
extension, the probability $r_i$ that an agent $i$ starts a new conversation is
proportional to its strength $s_i$, i.e. $r_i = s_i / (N \langle s \rangle)$. Therefore,
the probability that it starts a conversation $\tau$ time steps after the last one (its
gap distribution) is given by
$P_i(\tau) = r_i [1 - r_i]^{\tau - 1} \simeq r_i \exp(-\tau r_i)$, for sufficiently small
$r_i$. The gap distribution for all agents $P(\tau)$ is thus given by the convolution

$$P(\tau) = \int P(s) \, \frac{s}{N \langle s \rangle} \exp\left( -\tau \frac{s}{N \langle s \rangle} \right) ds, \tag{9}$$

where $P(s)$ is the strength distribution. This distribution has an exponential form,
which leads, from Eq. (9), to a total gap distribution
$P(\tau) \sim (1 + \tau / N)^{-2}$, with a heavy tail. Analogous arguments can be used in
the case of the SStat extension.

Extension   $P(w)$   $P(\Delta t)$   $P_i(\tau)$
SRep        ✓        ✓               ✓
SRan        ✓        ✗               ✗
SStat       ✓        ✓               ✗

Table III. Comparison of the properties of the original contact sequence preserved in the
synthetic extensions.
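The heavy tail quoted above follows from a short integration, sketched here under the assumption (ours, for illustration) that the strength distribution is a normalized exponential with mean $\langle s \rangle$, $P(s) = \langle s \rangle^{-1} e^{-s / \langle s \rangle}$ for $s \geq 0$:

```latex
\begin{align}
P(\tau)
  &= \int_0^{\infty} \frac{e^{-s/\langle s \rangle}}{\langle s \rangle}
     \, \frac{s}{N \langle s \rangle}
     \exp\left( -\frac{\tau s}{N \langle s \rangle} \right) ds \\
  &= \frac{1}{N \langle s \rangle^{2}}
     \int_0^{\infty} s \,
     \exp\left[ -\frac{s}{\langle s \rangle} \left( 1 + \frac{\tau}{N} \right) \right] ds \\
  &= \frac{1}{N \langle s \rangle^{2}} \,
     \frac{\langle s \rangle^{2}}{\left( 1 + \tau / N \right)^{2}}
   = \frac{1}{N} \left( 1 + \frac{\tau}{N} \right)^{-2}.
\end{align}
```

The second step uses $\int_0^{\infty} s \, e^{-a s} \, ds = 1 / a^{2}$ with $a = (1 + \tau / N) / \langle s \rangle$. As a consistency check, the result integrates to one over $\tau \in [0, \infty)$.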



IV. RANDOM WALKS ON EXTENDED CONTACT SEQUENCES

Let us consider a random walk on the sequence of instantaneous networks at
discrete time steps, which is equivalent to a message passing strategy in
which the message is passed to a randomly chosen neighbor. The walker
present at node i at time t hops to one of its neighbors, randomly chosen
from the set of vertices

$$V_i(t) = \{\, j \mid \chi(i,j,t) = 1 \,\}, \tag{10}$$

of which there is a number

$$k_i(t) = \sum_j \chi(i,j,t). \tag{11}$$

If the node i is isolated at time t, i.e. $V_i(t) = \emptyset$, the walker
remains at node i. In any case, time is increased as $t \to t + 1$.

Analytical considerations analogous to those in Sec. II for the case of
contact sequences are hampered by the presence of time correlations between
contacts. In fact, as we have seen, the contacts between a given pair of
agents are neither fixed nor completely random, but instead show long range
temporal correlations. An exception is represented by the randomized SRan
extension, in which successive contacts are by construction uncorrelated.
Considering that the random walker is in vertex i at time t, at a subsequent
time step it will be able to jump to a vertex j whenever a connection
between i and j is created, and a connection between i and j will be chosen
with probability proportional to the number of connections between i and j
in the original contact sequence, i.e. proportional to $\omega_{ij}$. That
is, a random walk on the extended SRan sequence behaves essentially as in
the corresponding weighted projected network, and therefore the equations
obtained in Sec. II, namely

$$\tau_i = \frac{\langle s\rangle N}{s_i} \tag{12}$$

and

$$\frac{C(t)}{N} = 1 - \frac{1}{N} \sum_i \exp\left(-t \frac{s_i}{\langle s\rangle N}\right), \tag{13}$$

apply. In this last expression for the coverage we can approximate the sum
by an integral, i.e.

$$\frac{C(t)}{N} = 1 - \int ds\, P(s) \exp\left(-t \frac{s}{\langle s\rangle N}\right), \tag{14}$$

where $P(s)$ is the distribution of strengths. Given that $P(s)$ has an
exponential behavior, we obtain from the last expression

$$\frac{C(t)}{N} = 1 - \left(1 + \frac{t}{N}\right)^{-1}. \tag{15}$$

V. NUMERICAL SIMULATIONS

In this Section we present numerical results from the simulation of random
walks on the extended contact sequences described above. To measure the
coverage $C(t)$ we set the duration of these sequences to 50 times the
duration of the original contact sequence T, while to evaluate the MFPT
between two nodes i and j, $\tau_{ij}$, we let the RW explore the network up
to a maximum time $t_{max} = 10^8$. Each result we report is averaged over at
least $10^3$ independent runs.

A. Network exploration



The network coverage $C(t)$ describes the fraction of nodes that the walker
has discovered up to time t. Figure 4 shows the normalized coverage
$C(t)/N$ as a function of time, averaged over different walks starting from
different sources, for the dynamical networks obtained using the SRep, SRan
and SStat prescriptions. Time is rescaled as $t \to pt$ to take into account
that the walker can find itself on an isolated vertex, as discussed before.
While for the SRep and SRan extensions the average number of interacting
nodes p is by construction the same as in the original contact sequence, for
the SStat extension we obtain numerically different values of p, which we
use when rescaling time in the corresponding simulations.

Figure 4. (Color Online) Normalized coverage $C(t)/N$ as a function of the
rescaled time $pt/N$, for the SRep, SRan and SStat extensions of empirical
data. The numerical evaluation of Eq. (13) is shown as a dashed line, and
each panel in the figure corresponds to one of the empirical datasets
considered. The exploration of the empirical repeated data sets (SRep) is
slower than in the other cases. Moreover, the SRan case is in agreement with
the theoretical prediction, and the SStat case shows a close (but
systematically slower) behavior. This indicates that the main slowing down
factor in the SRep sequence is represented by the irregular distribution of
the interactions in time, whose contribution is eliminated in the randomized
sequences.
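The walk rule of Sec. IV and the coverage measurement can be sketched in a few lines. This is a minimal illustration rather than the simulation code used for the figures: the representation of a contact sequence as a list of per-time-step edge lists, and the names `neighbors_at`, `walk` and `coverage_curve`, are assumptions made here for the example.

```python
import random

def neighbors_at(contacts_t, i):
    """Return V_i(t): the nodes in contact with i in one instantaneous network."""
    nbrs = set()
    for a, b in contacts_t:
        if a == i and b != i:
            nbrs.add(b)
        elif b == i and a != i:
            nbrs.add(a)
    return sorted(nbrs)

def walk(contact_sequence, start, rng=None):
    """Walker positions at t = 0, 1, ..., len(contact_sequence)."""
    rng = rng or random.Random()
    position = start
    trajectory = [position]
    for contacts_t in contact_sequence:
        nbrs = neighbors_at(contacts_t, position)
        if nbrs:                      # an isolated walker stays where it is
            position = rng.choice(nbrs)
        trajectory.append(position)   # time advances by one step in any case
    return trajectory

def coverage_curve(trajectory, n_nodes):
    """Normalized coverage C(t)/N: fraction of distinct nodes visited up to t."""
    seen = set()
    curve = []
    for node in trajectory:
        seen.add(node)
        curve.append(len(seen) / n_nodes)
    return curve

if __name__ == "__main__":
    sequence = [[(0, 1)], [], [(1, 2)], [(2, 1)]]
    trajectory = walk(sequence, start=0)
    print(trajectory, coverage_curve(trajectory, n_nodes=4))
```

Averaging `coverage_curve` over many runs and starting nodes, and plotting it against the rescaled time, would give curves of the kind discussed here.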

The coverage corresponding to the SRan extension is very well fitted by a
numerical simulation of Eq. (15), which predicts the coverage $C(t)/N$
obtained in the corresponding projected weighted network. Moreover, when
using the rescaled time pt, the SRan coverages for different datasets
collapse on top of each other for small times, with a linear time dependence
$C(t)/N \sim t/N$ for $t \ll N$ as expected in static networks, showing a
universal behavior (not shown).
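The relation between the sum in Eq. (13), the closed form of Eq. (15) and the linear regime at small times can be checked on synthetic data. The sketch below assumes strengths drawn from an exponential distribution (the form stated for $P(s)$); the sample size, the seed and the function names are illustrative and do not correspond to any of the empirical datasets.

```python
import math
import random

def coverage_eq13(t, strengths):
    """Mean-field coverage C(t)/N from Eq. (13), using the actual strengths."""
    n = len(strengths)
    mean_s = sum(strengths) / n
    return 1.0 - sum(math.exp(-t * s / (mean_s * n)) for s in strengths) / n

def coverage_eq15(t, n):
    """Closed form of Eq. (15), valid for an exponential P(s)."""
    return 1.0 - 1.0 / (1.0 + t / n)

# Synthetic strengths (assumption: exponential P(s), arbitrary mean).
rng = random.Random(0)
N = 5000
strengths = [rng.expovariate(1.0) for _ in range(N)]

for t in (0.01 * N, 0.1 * N, N, 10 * N):
    # columns: t/N, Eq. (13), Eq. (15)
    print(t / N, coverage_eq13(t, strengths), coverage_eq15(t, N))
```

For $t \ll N$ both expressions reduce to $t/N$, which is the linear regime mentioned above.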

The coverage obtained on the SStat extension is systematically smaller than
in the SRan case, but follows a similar evolution. On the other hand, the RW
exploration obtained with the SRep prescription is generally slower than the
other two, particularly for the 25c3 and ht datasets. As discussed before,
the original contact sequence, as well as the SRep extension, are
characterized by irregular distributions of the interactions in time,
showing periods with few interacting nodes and correspondingly a small
number $n(t)$ of newly started conversations, followed by peaks with many
interactions (see Fig. 5). This feature slows down the RW exploration,
because the RW may remain trapped for long times on isolated nodes. The SRan
and the SStat extensions, on the contrary, both destroy this kind of
temporal structure, balancing the periods of low and high activity: the SRan
extension randomizes the time order of the contact sequence, and the SStat
extension evens the number of interacting nodes, with n new conversations
starting at each time step.

Figure 5. (Color Online) Number of new conversations $n(t)$ started per unit
time in the SRep (black, full dots), SRan (red, empty squares) and SStat
(green, diamonds) extensions of the school dataset.

The similarity between the random walk processes on the SRan and SStat
dynamical networks shows that the random walk coverage is not very sensitive
to the heterogeneous durations of the conversations, as the main difference
between these two cases is that $P(\Delta t)$ is narrow for SRan and broad
for SStat. In these cases, the observed behavior is instead well accounted
for by Eq. (13), taking into account only the weight distribution of the
projected network, i.e., the heterogeneity between aggregated conversation
durations. Therefore, the slower exploration properties of the SRep
sequences can be mostly attributed to the correlations between consecutive
conversations of the single individuals, as given by the individual gap
distribution $P_i(\tau)$ (see [I31[T31|22] for analogous results in the
context of epidemic spreading).

A remark is in order for the 25c3 conference. A close inspection of Fig. 4
shows that the RW does not reach the whole network in any of the extension
schemes, with $C_{max} < 0.85$, although the duration of the simulation is
quite long, $p t_{max} > 10^2 N$. The reason is that this dataset contains a
group of nodes (around 20% of the total) with a very low strength $s_i$,
meaning that there are actors who are isolated for most of the time, and
whose interactions are reduced to one or two contacts in the whole contact
sequence. Given that each extension we use preserves the P(w) distribution,
the discovery of these nodes is very difficult. The consequence is that we
observe an extremely slow approach to the asymptotic value
$\lim_{t\to\infty} C(t)/N = 1$. Indeed, the mean-field calculations
presented in Secs. II and III C suggest a power-law decay with
$(1 + pt/N)^{-1}$ for the residual coverage $1 - C(t)/N$. In Fig. 6 we plot
the asymptotic coverage for large times in the 4 datasets considered. We can
see that the RW on the eswc and ht datasets conform at large times quite
reasonably to the expected theoretical prediction in Eq. (15), both for the
SRep and SRan extensions. The 25c3 dataset shows, as discussed above, a
considerable slowing down, with a very slow decay in time. Interestingly,
the school dataset is much faster than all the rest, with the residual
coverage $1 - C(t)/N$ exhibiting an approximately exponential decay. It is
noteworthy that the plots for the randomized SRan sequence do not always
obey the mean-field prediction (see lower plot in Fig. 6). This deviation
can be attributed to the fact that SRan extensions preserve the topological
structure of the projected weighted network, and it is known that, in some
instances, random walks on weighted networks can deviate from the mean-field
predictions [45]. These deviations are particularly strong in the case of
the 25c3 dataset, where connections with a very small weight are present.

Figure 6. (Color Online) Asymptotic residual coverage $1 - C(t)/N$ as a
function of $pt/N$ for the SRep (top) and SRan (bottom) extended sequences,
for different datasets.

B. Mean first-passage time

Let us now focus on another important characteristic property of random walk
processes, namely the MFPT defined in Section II. Figure 7 shows the
correlation between the MFPT $\tau_i$ of each node, measured in units of
rescaled time pt, and its normalized strength $s_i/(N\langle s\rangle)$.

Figure 7. (Color Online) Rescaled mean first passage time $\tau_i$, shown
against the strength $s_i$, normalized with the total strength
$N\langle s\rangle$, for the SRep, SRan and SStat extensions of empirical
data. The dashed line represents the prediction of Eq. (12). Each panel in
the figure corresponds to one of the empirical datasets considered.

The random walks performed on the SRan and SStat extensions are very well
fitted by the mean field theory, i.e. Eq. (12) (predicting that $\tau_i$ is
inversely proportional to $s_i$), for every dataset considered; on the other
hand, random walks on the extended sequence SRep yield at the same time
deviations from the mean-field prediction and much stronger fluctuations
around an average behavior. Figure 8 addresses this case in more detail,
showing that the data corresponding to RW on different datasets collapse on
an average behavior that can be fitted by a scaling function of the form



$$p\,\tau_i \sim \left(\frac{s_i}{N\langle s\rangle}\right)^{-\alpha}, \tag{16}$$

with an exponent $\alpha \simeq 0.75$.
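An exponent of this kind can be estimated from (normalized strength, rescaled MFPT) pairs by a straight-line fit in log-log scale. The text does not state which fitting procedure was used, so the ordinary least-squares sketch below is only one reasonable assumption, and the function name and the synthetic data are illustrative.

```python
import math

def fit_power_law_exponent(x, y):
    """Least-squares slope of log(y) vs log(x); returns alpha for y ~ x**(-alpha)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return -sxy / sxx

if __name__ == "__main__":
    # Synthetic data obeying the scaling form exactly, with an arbitrary prefactor.
    normalized_strength = [10 ** (-k / 4) for k in range(4, 17)]
    rescaled_mfpt = [3.0 * v ** -0.75 for v in normalized_strength]
    print(fit_power_law_exponent(normalized_strength, rescaled_mfpt))
```

On noisy empirical data the fitted value depends on the range of strengths included and on how the points are binned, so it should be read as an estimate.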

These results show that the MFPT, similarly to the coverage, is rather
insensitive to the distribution of the contact durations, as long as the
distribution of cumulated contact durations between individuals is preserved
(the weights of the links in the projected network). Therefore, the
deviations of the results obtained with the SRep extension of the empirical
sequences have their origin in the burstiness of the contact patterns, as
determined by the temporal correlations between consecutive conversations.
The exponent $\alpha < 1$ means that the searching process in the empirical,
correlated, network is slower than in the randomized versions, in agreement
with the smaller coverage observed in Fig. 4.

Figure 8. (Color Online) Mean first passage time at node i, in units of
rescaled time pt, vs. the strength $s_i$, normalized with the total strength
$N\langle s\rangle$, for RW processes on the SRep extensions of the
datasets. All data collapse close to the continuous line whose slope,
$\alpha \simeq 0.75$, differs from the theoretical one, $\alpha = 1.0$,
shown as a dashed line.

Figure 9. (Color Online) Normalized coverage $C(t)/N$ as a function of the
rescaled time $pt/N$ for the different datasets. The inset shows the
probability distribution $P(\Delta t_{new})$ of the time lag
$\Delta t_{new}$ between the discovery of two new vertices. Only the
discovery of the first 5% of the network is considered, to avoid finite size
effects [46].

The data collapse observed in Fig. [8] for the SRep 
case leads to two noticeable conclusions. First, although 
the various datasets studied correspond to different con- 
texts, with different numbers of individuals and densities 
of contacts, simple rescaling procedures are enough to 
compare the processes occurring on the different tempo- 
ral networks, at least for some given quantities. Second, 
the MFPT at a node is largely determined by its strength. 
This can indeed seem counterintuitive as the strength is 
an aggregated quantity (that may include contact events 
occurring at late times). However, it can be rationalized 
by observing that a large strength means a large num- 
ber of contacts and therefore a large probability to be 
reached by the random walker. Moreover, the fact that 
the strength of a node is an aggregate view of contact 
events that do not occur homogeneously for all nodes but 
in a bursty fashion leads to strong fluctuations around 
the average behavior, which implies that nodes with the
same strength can also have rather different MFPTs (note
the logarithmic scale on the y-axis).



VI. RANDOM WALKS ON FINITE CONTACT 
SEQUENCES 

The case of finite sequences is interesting from the point of view of
realistic searching processes. The limited duration of a human gathering,
for example, imposes a constraint on the length of any searching strategy.
Fig. 9 shows the normalized coverage $C(t)/N$ as a function of the rescaled
time $pt/N$. The coverage exhibits a considerable variability in the
different datasets, which do not obey the rescaling obtained for the
extended SRan and SStat sequences. The probability distribution of the time
lags $\Delta t_{new}$ between the discovery of two new vertices [46]
provides further evidence of the slowing down of diffusion in temporal
networks. The inset of Fig. 9 indeed shows broad tailed distributions
$P(\Delta t_{new})$ for all the datasets considered, in contrast with the
exponential decay observed in binary static networks [46].
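The time lags $\Delta t_{new}$ can be read directly off a walker trajectory. The sketch below is illustrative only: it assumes a trajectory given as the list of visited nodes at successive time steps, counts the starting node as discovered at $t = 0$, and does not apply the restriction to the first 5% of the network mentioned in the caption of Fig. 9; the function name is hypothetical.

```python
def discovery_time_lags(trajectory):
    """Time lags between successive discoveries of previously unvisited nodes."""
    seen = set()
    last_discovery = None
    lags = []
    for t, node in enumerate(trajectory):
        if node not in seen:
            seen.add(node)
            if last_discovery is not None:
                lags.append(t - last_discovery)
            last_discovery = t   # the start node counts as discovered at t = 0
    return lags

if __name__ == "__main__":
    # New nodes are found at t = 0, 1, 3 and 6.
    print(discovery_time_lags([0, 1, 1, 2, 1, 0, 3]))
```

A histogram of these lags, accumulated over many runs, gives an estimate of $P(\Delta t_{new})$.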

The important differences in the rescaled coverage $C(t)/N$ between the
various datasets, shown in Fig. 9, can be attributed to the choice of the
time scale, $pt/N$, which corresponds to a temporal rescaling by an average
quantity. We can argue, indeed, that the speed with which new nodes are
found by the RW is proportional to the number of new conversations $n(t)$
started at each time step t, thus in the RW exploration of the temporal
network the effective time scale is given by the integrated number of new
conversations up to time t, $N(t) = \int_0^t n(t')\,dt'$. In Fig. 10 we
display the correlation between the coverage $C(t)/N$ and the number of new
conversations realized up to time t, $N(t)$, normalized for the mean number
of new conversations per unit of time, n. While the relation is not strictly
linear, a very strong positive correlation appears between the two
quantities.
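In discrete time the effective time scale $N(t)$ is a running sum of $n(t)$. The sketch below assumes that a conversation counts as new at step t when the pair is in contact at t but was not at t - 1; this operational definition, the list-of-edge-lists representation and the function names are assumptions made for illustration.

```python
from itertools import accumulate

def new_conversations(contact_sequence):
    """n(t): pairs in contact at step t that were not in contact at step t - 1."""
    previous = set()
    counts = []
    for contacts_t in contact_sequence:
        current = {frozenset(pair) for pair in contacts_t}  # undirected pairs
        counts.append(len(current - previous))
        previous = current
    return counts

def integrated_new_conversations(contact_sequence):
    """N(t): running total of n(t'), the discrete analogue of the integral."""
    return list(accumulate(new_conversations(contact_sequence)))

if __name__ == "__main__":
    sequence = [[(0, 1)], [(1, 0), (2, 3)], [], [(0, 1)]]
    print(new_conversations(sequence))             # [1, 1, 0, 1]
    print(integrated_new_conversations(sequence))  # [1, 2, 2, 3]
```

Plotting the averaged coverage against $N(t)$ divided by the mean of $n(t)$ gives a curve of the kind described for Fig. 10.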
The complex pattern shown by the average coverage $C(t)$ originates from the
lack of self-averaging in a dynamic network. Figure 11 shows the rank plot
of the coverage $C_i$ obtained at the end of a RW process starting from node
i, and averaged over $10^3$ runs. Clearly, not all vertices are equivalent.
A first explanation of the variability in $C_i$ comes from the fact that not
all nodes appear simultaneously on the network at time 0. If $t_{0,i}$
denotes the arrival time of node i in the system, a random walk starting
from i is restricted to $T_i = T - t_{0,i}$: nodes arriving at later times
have fewer possibilities to explore their set of influence, even if this set
includes all nodes. To put all nodes on equal footing and compensate for
this somehow trivial difference between nodes, we consider the coverage of
random walkers starting on the different vertices i and walking for exactly
$\Delta T$ time steps (we limit of course the study to nodes with
$t_{0,i} < T - \Delta T$). Differences in the coverage $C_i(\Delta T)$ will
then depend on the intrinsic properties of the dynamic network. For a static
network indeed, either binary or weighted, the coverage $C_i(\Delta T)$
would be independent of i, as random walkers on static networks lose the
memory of their initial position in a few steps, reaching very fast the
steady state behavior Eq. (pi). As the inset of Fig. 11 shows, important
heterogeneities are instead observed in the coverage of random walkers
starting from different nodes on the dynamic network, even if the random
walk duration is the same.

Figure 10. (Color Online) Coverage $C(t)/N$ as a function of the number of
new conversations realized up to time t, normalized for the mean number of
new conversations per unit of time.

Figure 11. (Color Online) Rank plot of the coverage $C_i$ obtained starting
from node i in the contact sequence of duration T, averaged over $10^3$
runs. In the inset, we show a rank plot of the coverage $C_i(\Delta T)$ up
to a fixed time $\Delta T = 10^3$.

Figure 12. (Color Online) Correlation between the probability of node i to
be reached by the RW, $P_r(i)$, and the rescaled strength
$pTs_i/(N\langle s\rangle)$ for different datasets. The curves obtained for
different datasets collapse, but they do not follow the mean-field behavior
predicted by Eq. (17) (dashed line). The inset shows the same data on a
linear scale, to emphasize the deviation from mean-field.

Another interesting quantity is the probability that a vertex i is
discovered by the random walker. As discussed in Section II, at the mean
field level the probability that a node i is visited by the RW at any time
less than or equal to t (the random walk reachability) takes the form
$P_r(i;t) = 1 - \exp[-t\rho(i)]$. Thus the probability that the node i is
reached by the RW at any time in the contact sequence is



$$P_r(i) = 1 - \exp\left(-\frac{p\,T\,s_i}{N\langle s\rangle}\right), \tag{17}$$
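For reference, the mean-field reachability of Eq. (17) is straightforward to evaluate from a list of strengths. In the sketch below the argument names (`strengths`, `p`, `T`) follow the symbols of the text, while the function name and the example values are illustrative assumptions.

```python
import math

def mean_field_reachability(strengths, p, T):
    """Mean-field P_r(i) = 1 - exp(-p*T*s_i / (N*<s>)) for every node i."""
    n = len(strengths)
    mean_s = sum(strengths) / n
    return [1.0 - math.exp(-p * T * s / (n * mean_s)) for s in strengths]

if __name__ == "__main__":
    # Hypothetical strengths; p and T are chosen only for illustration.
    print(mean_field_reachability([1, 2, 5, 12], p=0.5, T=40))
```

The prediction depends on the strengths only through the rescaled variable $pTs_i/(N\langle s\rangle)$, which is the quantity used on the horizontal axis of Fig. 12.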



where the rescaled time pt is taken into account. In Fig. 12 we plot the
probability $P_r(i)$ of node i to be reached by the RW during the contact
sequence as a function of its strength $s_i$. $P_r(i)$ exhibits a clear
increasing behavior with $s_i$, larger strength corresponding to larger time
in contact and therefore larger probabilities to be reached. Interestingly,
the simple rescaling by p and $\langle s\rangle$ leads to an approximate
data collapse for the RW processes on the various dynamical networks,
showing a very robust behavior. Similarly to the case of the MFPT on
extended sequences, the dynamical property $P_r(i)$ can be in part
"predicted" by an aggregate quantity such as $s_i$. Strong deviations from
the mean-field prediction of Eq. (17) are however observed, with a tendency
of $P_r(i)$ to saturate at large strengths to values much smaller than the
ones obtained on a static network. Thus, although the set of sources of
almost every node i has size N, as shown in Sec. III B (i.e., there exists a
time respecting path between almost every possible starting point of the RW
processes and every target node i), the probability for node i to be
effectively reached by a RW is far from being equal to 1.

Figure 13. (Color Online) Correlation between the probability of node i to
be reached by a RW of length T/2, $P_r(i)$, and the rescaled strength
$pTs_i/(N\langle s\rangle)$ for different datasets, where $s_i$ is computed
on the whole dataset of length T. The inset shows the same data on a linear
scale.

Moreover, rather strong fluctuations of $P_r(i)$ at given $s_i$ are also
observed: $s_i$ is indeed an aggregate view of contacts which are typically
inhomogeneous in time, with bursty behavior (see footnote 2). Figure 13 also
shows that the reachability computed at shorter time (here T/2) displays
stronger fluctuations as a function of the strength $s_i$ computed on the
whole time sequence: $P_r(i)$ for shorter RW is naturally less correlated
with an aggregate view which takes into account a more global behavior of i.



VII. DISCUSSION AND CONCLUSIONS 

In this paper we have investigated the behavior of random walks on temporal
networks. In particular, we have focused on real face-to-face contact
networks concerning four different datasets. These dynamical networks
exhibit heterogeneous and bursty behavior, indicated by the long tailed
distributions for the lengths and strength of conversations, as well as for
the gaps separating successive interactions. We have underlined the
importance of considering not only the existence of time preserving paths
between pairs of nodes, but also their temporal duration: shortest paths can
take much longer than fastest paths, while fastest paths can correspond to
many more hops than shortest paths. Interestingly, the appropriate rescaling
of these quantities identifies universal behaviors shared across the four
datasets.

2 When considering RW on a contact sequence of length T randomized according
to the SRan procedure instead, Eq. (17) is well obeyed and only small
fluctuations of $P_r(i)$ are observed at a fixed $s_i$ (not shown).

Given the finite life-time of each network, we have con- 
sidered as substrate for the random walk process the 
replicated sequences in which the same time series of 
contact patterns is indefinitely repeated. At the same 
time, we have proposed two different randomization pro- 
cedures to investigate the effects of correlations in the 
real dataset. The "sequence randomization" (SRan) destroys
any temporal correlation by randomizing the time
ordering of the sequence. This allows us to write down exact
mean-field equations for the random walker exploring
these networks, which turn out to be substantially
equivalent to the ones describing the exploration of the 
weighted projected network. The "statistically extended 
sequence" (SStat), on the other hand, selects random 
conversations from the original sequence, thus preserv- 
ing the statistical properties of the original time series, 
with the exception of the distribution of time gaps be- 
tween consecutive conversations. 
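The first two constructions can be sketched as follows. This is an illustration under stated assumptions, not the exact procedure used for the results: a contact sequence is taken to be a list of per-time-step edge lists, SRep concatenates copies of it, and SRan is implemented here by shuffling the time order of the snapshots independently in each period (drawing snapshots uniformly at random with replacement would be an alternative reading). The function names are hypothetical, and the SStat construction is not attempted because it needs the conversation-level details given earlier in the paper.

```python
import random

def extend_srep(contact_sequence, n_periods):
    """SRep: repeat the same time series of snapshots n_periods times."""
    return [list(frame) for _ in range(n_periods) for frame in contact_sequence]

def extend_sran(contact_sequence, n_periods, rng=None):
    """SRan sketch: shuffle the time order of the snapshots in each period."""
    rng = rng or random.Random()
    extended = []
    for _ in range(n_periods):
        period = [list(frame) for frame in contact_sequence]
        rng.shuffle(period)          # destroys temporal ordering, keeps every snapshot
        extended.extend(period)
    return extended

if __name__ == "__main__":
    sequence = [[(0, 1)], [], [(1, 2)]]
    print(extend_srep(sequence, 2))
    print(extend_sran(sequence, 2, rng=random.Random(1)))
```

Both extensions keep the number of occurrences of each contact per period, and hence the weights of the projected network, unchanged.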

We have performed numerical analysis both for the 
coverage and the MFPT properties of the random walker. 
In both cases we have found that the empirical sequences 
deviate systematically from the mean field prediction, in- 
ducing a slowing down of the network exploration and 
of the MFPT. Remarkably, the analysis of the random- 
ized sequences has allowed us to point out that this is 
due uniquely to the temporal correlations between con- 
secutive conversations present in the data, and not to 
the heterogeneity of their lengths. Finally, we have addressed
the role of the finite size of the empirical networks,
which turns out to prevent a full exploration by
the random walker, though differences exist across the
four considered cases. In this context, we have also shown
that different starting nodes provide on average different
coverages of the networks, at odds with what happens in
static graphs. In the same way, the probability that the 
node i is reached by the RW at any time in the contact 
sequence exhibits a common behavior across the differ- 
ent time series, but it is not described by the mean-field 
predictions for the aggregated network, which predict a 
faster process. 

In conclusion, the contribution of our analysis is two- 
fold. On the one hand, we have proposed a general 
way to study dynamical processes on temporally evolv- 
ing networks, by the introduction of randomized bench- 
marks and the definition of appropriate quantities that 
characterize the network dynamics. On the other hand, 
for the specific, yet fundamental, case of the random 
walk, we have obtained detailed results that clarify the
observed dynamics, and that will represent a reference
for the understanding of more complex diffusive dynamics
occurring on dynamic networks. Our investigations
also open interesting directions for future work. For in- 
stance, it would be interesting to investigate how random 
walks starting from different nodes explore first their own 
neighborhood [37] , which might lead to hints about the 
definition of "temporal communities" (see e.g. [IB] for 
an algorithm using RW on static networks for the detec- 
tion of static communities); various measures of nodes 
centrality have also been defined in temporal networks 
P3 HH H5H5T] , but their computation is rather heavy, and 
RW processes might present interesting alternatives, similarly
to the case of static networks.